This is a simple example of Ridge Regression using Python and the scikit-learn library.
Ridge Regression is a linear regression technique that adds an L2 regularization term (a penalty on the sum of squared coefficients) to the least-squares loss in order to reduce overfitting. It is particularly useful when the features in the dataset are highly correlated or when the number of features is close to or exceeds the number of observations. The hyperparameter alpha controls the strength of the penalty: larger values shrink the coefficients more strongly toward zero, while alpha = 0 reduces to ordinary least squares.
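To make the shrinking effect of alpha concrete, here is a minimal sketch that solves the ridge problem in closed form with NumPy. The helper name `ridge_closed_form`, the synthetic data, and the omission of an intercept are assumptions made for brevity; this is not how scikit-learn implements it internally.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Minimize ||y - X w||^2 + alpha * ||w||^2 (hypothetical helper, no intercept)."""
    n_features = X.shape[1]
    # Normal equations with alpha added on the diagonal
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Small synthetic problem (assumed for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

demo_alphas = [0.0, 1.0, 10.0, 100.0]
coef_norms = [np.linalg.norm(ridge_closed_form(X_demo, y_demo, a)) for a in demo_alphas]

for a, norm in zip(demo_alphas, coef_norms):
    print(f"alpha={a:>5}: ||w|| = {norm:.4f}")
```

The printed coefficient norm should get smaller as alpha grows, which is the shrinkage described above.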
Key concepts of Ridge Regression:
Ridge Regression is commonly used in situations where multicollinearity among features is present.
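As a rough illustration of the multicollinearity case, the sketch below fits ordinary least squares and ridge on two nearly identical features. The data, the variable names, and the choice of alpha=1.0 are all assumptions for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-3 * rng.normal(size=200)  # almost a copy of x1
X_collinear = np.column_stack([x1, x2])
y_collinear = 3 * x1 + rng.normal(size=200)  # only x1 drives the target

ols = LinearRegression().fit(X_collinear, y_collinear)
ridge = Ridge(alpha=1.0).fit(X_collinear, y_collinear)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

With features this strongly correlated, the least-squares coefficients tend to be unstable and can take large values of opposite sign, whereas the ridge coefficients are typically smaller and spread the effect across both features.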
Python Source Code:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Generate synthetic data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Ridge Regression model with different alpha values
alpha_values = [0, 1, 10, 100]
plt.figure(figsize=(10, 6))
for alpha in alpha_values:
    ridge_model = Ridge(alpha=alpha, random_state=42)
    ridge_model.fit(X_train, y_train)
    # Predict on the test set
    y_pred = ridge_model.predict(X_test)
    # Plot the model's predictions
    x_range = np.linspace(0, 2, 100).reshape(-1, 1)
    y_range = ridge_model.predict(x_range)
    plt.plot(x_range, y_range, label=f'Ridge Regression (alpha={alpha})')
# Plot the true data points
plt.scatter(X_test, y_test, color='black', label='True Data Points')
plt.title('Ridge Regression with Different Alpha Values')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
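The script above imports mean_squared_error but only plots the fitted lines. One possible way to compare the alpha values numerically is sketched below. It regenerates the same synthetic data so that it runs on its own; the dictionary name `mse_by_alpha` is an assumption for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same synthetic data and split as in the script above
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

mse_by_alpha = {}
for alpha in [0, 1, 10, 100]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    # Mean squared error on the held-out test set
    mse_by_alpha[alpha] = mean_squared_error(y_test, model.predict(X_test))

for alpha, mse in mse_by_alpha.items():
    print(f"alpha={alpha:>3}: test MSE = {mse:.3f}")
```

Comparing these values alongside the plot gives a rough sense of when the penalty starts to underfit this single-feature dataset.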
Explanation: